Abstract. As was noted already by A. N. Kolmogorov, any random variable
has a Bernoulli component. This observation provides a tool for the extension 
of results which are known for Bernoulli random variables to arbitrary distri- 
butions. Two applications are provided here: i. an anti-concentration bound 
for a class of functions of independent random variables, where probabilistic 
bounds are extracted from combinatorial results, and ii. a proof, based on the 
Bernoulli case, of spectral localization for random Schrödinger operators with
arbitrary probability distributions for the single site coupling constants. For a 
general random variable, the Bernoulli component may be defined so that its 
conditional variance is uniformly positive. The natural maximization problem 
is an optimal transport question which is also addressed here. 
1. Introduction

This article has a twofold purpose. As a general observation it is noted that in
any random variable one may find a Bernoulli component. A decomposition based on
this observation then allows one to extend results which, for systems of
Bernoulli variables, are available by combinatorial methods to systems of random
variables of arbitrary distribution.

A Bernoulli decomposition of a real-valued random variable $X$ is a representation
of the form

$$X \stackrel{\mathcal{D}}{=} Y(t) + \delta(t)\,\eta, \tag{1.1}$$

where $Y(\cdot)$ and $\delta(\cdot) \ge 0$ are functions on $(0,1)$, the variable $t$ is
uniformly distributed in $(0,1)$, and $\eta$ is a Bernoulli random variable taking the
values $\{0,1\}$ with probabilities $\{1-p, p\}$ independently of $t$. The relation in
(1.1) is to be understood as expressing equality of the distributions of the
corresponding random variables.

Bernoulli decompositions are constructed here for arbitrary random variables 
of non-degenerate distributions. For certain purposes it is useful to have positive 
uniform conditional variance of the Bernoulli term, i.e., 

$$\inf_{t \in (0,1)} \delta(t) > 0. \tag{1.2}$$

We present such a representation below and discuss related issues of optimality. 

Two applications are presented here: i. anti-concentration bounds for monotone,
though not necessarily linear, functions of independent random variables, and ii.
a proof, based on the Bernoulli case [BK], of spectral localization for random
Schrödinger operators with arbitrary probability distributions for the single site
coupling constants.

In the first application, we consider functions $\Phi(X_1,\dots,X_N)$ of independent
non-degenerate random variables $\{X_j\}$ whose distributions are either identical or,
in a sense explained below, are of widths greater than some common $b_X > 0$. It is
shown here that if for some $\varepsilon > 0$ the function satisfies

$$\Phi(u + v e_j) - \Phi(u) > \varepsilon \tag{1.3}$$

for all $v \ge b_X$, all $u \in \mathbb{R}^N$, and $j = 1,\dots,N$, where $e_j$ is the unit
vector in the $j$-direction, then the following concentration bound applies:

$$\sup_{x \in \mathbb{R}} \mathbb{P}\left(\{\Phi(X_1,\dots,X_N) \in [x, x+\varepsilon]\}\right) \le \frac{C_X}{\sqrt{N}} \tag{1.4}$$

with a constant $C_X < \infty$ which depends on the uniform bounds on the
distributions of $\{X_j\}$. The proof employs the Bernoulli representation along with
the combinatorial bounds of Sperner [S], and the more general LYM lemma [E].

The use of combinatorial estimates for concentration bounds first appeared in
the context of Bernoulli variables in P. Erdős' variant of the Littlewood-Offord
Lemma [Er]. The presence of a Bernoulli component in any random variable was
noted implicitly in the work of A. N. Kolmogorov [Ko] where it was put to use in an
improvement of the earlier concentration bounds of W. Doeblin and P. Lévy [DoL, Do]
on linear functions of independent random variables. Initially, Kolmogorov did
not extract the maximal benefit from the method by not connecting it with Sperner
theory, and in particular the concentration bound in [Ko] includes an unnecessary
logarithmic factor; the corresponding improvement was made by B. A. Rogozin [R1].
The bounds were further improved in a series of works, in particular [Es, K, R2],
where use was also made of other methods. One may note here that, perhaps
quite naturally, a general method like the Bernoulli decomposition is not optimized
for specific applications. Nevertheless, it has the benefit of providing a simple
perspective on a number of topics.

In our second application, we establish spectral localization for a broad class
of continuum, alloy-type random Schrödinger operators (cf. (4.1)), building on a
result of J. Bourgain and C. Kenig [BK] for the Bernoulli case. The model and
the results are presented more explicitly in Section 4. The main point to be made
here is that the understanding of spectral localization for the Bernoulli case can
be extended through the Bernoulli decomposition to random operators with single
site coupling parameters of arbitrary distribution (cf. Theorem 4.2).

2. Bernoulli decompositions for random variables 

Randomness often is in the eyes of the beholder, as probability measures are used 
to express averages over specified sets of rather varied nature. However, it may be 
true that the most elementary model underlying the basic popular perception of 
probability is the simple 'coin toss', with two possible outcomes: heads or tails, 
which is modeled by a Bernoulli random variable: a binary variable equal to 1 with
probability $p$ and equal to 0 with probability $1 - p$.

2.1. The decomposition in two variants. The following statements assert that
any real-valued random variable has a Bernoulli component, which can even be
chosen to be of uniformly positive variance.

Given a real random variable $X$, by default we shall denote its probability
distribution by $\mu$ and let $G : (0,1) \to (-\infty,\infty)$ be the function defined by

$$G(t) := \inf\{u \in \mathbb{R} : \mu((-\infty, u]) \ge t\}. \tag{2.1}$$

One may observe that $G$ is the 'inverse' distribution function of $\mu$, which takes
values in the essential range of $X$. It can alternatively be described by

$$G(t) \le u \iff \mu((-\infty, u]) \ge t, \tag{2.2}$$

and satisfies $\mu((-\infty, G(t) - \varepsilon]) < t \le \mu((-\infty, G(t)])$ for all
$t \in (0,1)$ and $\varepsilon > 0$.
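As an illustration of the definition (2.1), the following minimal sketch evaluates $G$ for a finitely supported measure and checks the characterization (2.2) on a grid. The three-point measure, the helper names `make_quantile` and `F`, and the use of exact rational arithmetic are assumptions made only for this example.

```python
from bisect import bisect_left
from fractions import Fraction
from itertools import accumulate

def make_quantile(atoms, weights):
    """Return (G, F) for a finitely supported measure mu.

    G(t) = inf{u : mu((-inf, u]) >= t} is the left-continuous inverse of the
    distribution function F(u) = mu((-inf, u]).  `atoms` must be sorted.
    """
    cdf = list(accumulate(weights))  # F evaluated at each atom

    def G(t):
        # index of the first atom whose cumulative mass reaches t
        return atoms[bisect_left(cdf, t)]

    def F(u):
        return sum((w for a, w in zip(atoms, weights) if a <= u), Fraction(0))

    return G, F

# Hypothetical example: mu = (1/4) d_0 + (1/2) d_1 + (1/4) d_3
atoms = [0, 1, 3]
weights = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
G, F = make_quantile(atoms, weights)

# Check (2.2): G(t) <= u  iff  F(u) >= t, on a grid of (t, u) pairs
ts = [Fraction(k, 20) for k in range(1, 20)]
us = [Fraction(k, 2) for k in range(-2, 9)]
equivalence_holds = all((G(t) <= u) == (F(u) >= t) for t in ts for u in us)
```

Exact fractions are used so that the comparisons at the jump points of $F$ are not affected by rounding.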

Theorem 2.1. Let $X$ be a non-degenerate real-valued random variable with a
probability distribution $\mu$. Then, for each $p \in (0,1)$, $X$ admits a decomposition
of the form:

$$X \stackrel{\mathcal{D}}{=} Y_p(t) + \delta_p^+(t)\,\eta, \tag{2.3}$$

in the sense of equality of the corresponding probability distributions, where:

(1) $\eta$ and $t$ are independent random variables, with $\eta$ a binary variable taking
the values $\{0,1\}$ with probabilities $\{1-p, p\}$, correspondingly, and $t$ having
the uniform distribution in $(0,1)$,

(2) $Y_p : (0,1) \to (-\infty,\infty)$ is the monotone non-decreasing function

$$Y_p(t) := G((1-p)t), \tag{2.4}$$

(3) $\delta_p^+ : (0,1) \to [0,\infty)$ is the function

$$\delta_p^+(t) := G(1-p+pt) - G((1-p)t), \tag{2.5}$$

(4) for at least one value of $p \in (0,1)$ we have

$$\beta^+(p,\mu) := \inf_{t \in (0,1)} \delta_p^+(t) > 0. \tag{2.6}$$

Some explicit expressions for $\beta^+(p,\mu)$ are mentioned in Remark 2.1 below. The
Bernoulli component of the measure is not a uniquely defined notion, and other
representations similar to (2.3) but with different distributions for the conditional
variance of the Bernoulli component, i.e., for $\delta(t)$, can also be obtained. In the
following construction its uniform positivity may be lost but one gains the feature
that the range of values which $\delta$ assumes reaches up to the diameter of the support
of the measure $\mu$.
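The decomposition of Theorem 2.1 can be checked by hand on small examples. The sketch below does so for a hypothetical three-point measure with $p = 1/2$: it evaluates $Y_p$ and $\delta_p^+$ from (2.4) and (2.5) on a grid of midpoints in $(0,1)$ and recovers the law of $Y_p(t) + \delta_p^+(t)\eta$. The measure, the grid size, and the function names are assumptions of this example; the grid is chosen so that no midpoint falls on a jump of the step functions, which makes the averages exact here.

```python
from bisect import bisect_left
from collections import Counter
from fractions import Fraction
from itertools import accumulate

# Hypothetical example: mu = (1/4) d_0 + (1/2) d_1 + (1/4) d_3
atoms = [0, 1, 3]
weights = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
cdf = list(accumulate(weights))

def G(t):
    """Left-continuous inverse distribution function, as in (2.1)."""
    return atoms[bisect_left(cdf, t)]

def chasing(p, t):
    """Return (Y_p(t), delta_p^+(t)) following (2.4) and (2.5)."""
    y = G((1 - p) * t)
    delta = G(1 - p + p * t) - y
    return y, delta

def law_of_decomposition(p, n=1000):
    """Law of Y_p(t) + delta_p^+(t) * eta with t averaged over n midpoints."""
    law = Counter()
    for k in range(n):
        t = Fraction(2 * k + 1, 2 * n)
        y, delta = chasing(p, t)
        law[y] += (1 - p) / n       # contribution of eta = 0
        law[y + delta] += p / n     # contribution of eta = 1
    return dict(law)

p = Fraction(1, 2)
law = law_of_decomposition(p)
# Minimum of delta_p^+ over the grid; for this step function it equals beta^+
beta_plus = min(chasing(p, Fraction(2 * k + 1, 2000))[1] for k in range(1000))
```

In this example the recovered law coincides with $\mu$ and the Bernoulli term has conditional width at least 1 for every $t$.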

Theorem 2.2. Let $X$ be a non-degenerate real-valued random variable with
probability distribution $\mu$. Then, for each $p \in (0,1)$, $X$ admits a decomposition
of the form:

$$X \stackrel{\mathcal{D}}{=} Y_p(t) + \delta_p^-(t)\,\eta \tag{2.7}$$

where $t$, $\eta$ and the function $Y_p$ are as in Theorem 2.1, satisfying the above (1) and
(2), but instead of (3) and (4) the following holds

(3') $\delta_p^- : (0,1) \to [0,\infty)$ is the non-increasing function:

$$\delta_p^-(t) := G(1-pt) - G((1-p)t), \tag{2.8}$$

(4') for any $x_- < x_+$ and $p_\pm > 0$ such that

$$\mathbb{P}(\{X \le x_-\}) \ge p_- \quad\text{and}\quad \mathbb{P}(\{X > x_+\}) \ge p_+, \tag{2.9}$$

at the particular value $p = \frac{p_+}{p_- + p_+}$ we have

$$\mathbb{P}_t(\{\delta_p^-(t) \ge x_+ - x_-\}) \ge p_- + p_+, \tag{2.10}$$

where the probability is with respect to the uniform random variable $t$.

In the proofs we employ two versions of what is called here the Pac-Man
algorithm for the construction of a joint distribution $\rho$ of a pair of random
variables, of the form $\{Y_1(t), Y_2(t)\}$, whose marginal probability measures,
$\rho_1$, $\rho_2$, satisfy

$$\mu = (1-p)\rho_1 + p\rho_2. \tag{2.11}$$

The representations (2.3) and (2.7) correspond to letting:

$$Y_p(t) := Y_1(t), \qquad \delta_p^{\pm}(t) := Y_2(t) - Y_1(t). \tag{2.12}$$

The two Theorems will be proven in reverse order.

Proof of Theorem 2.2. We start by recalling the known observation that for any
continuous function $\phi \in C(\mathbb{R})$:

$$\int_0^1 \phi(G(s))\,ds = \int_{\mathbb{R}} \phi(x)\,d\mu(x). \tag{2.13}$$

This relation allows one to represent $X$, in terms similar to (2.7), as

$$X \stackrel{\mathcal{D}}{=} G(t), \tag{2.14}$$

with $t$ the random variable with the uniform distribution in $[0,1]$.
Extending the above representation, we now define a pair of coupled random
variables through the following functions of $t \in [0,1]$:

$$Y_1(t) := G((1-p)t), \qquad Y_2(t) := G(1-pt). \tag{2.15}$$

As is the case of $G$ in (2.14), the functions $Y_1$ and $Y_2$ are made into random
variables by assigning to them the joint probability distribution which is induced
by Lebesgue measure on $[0,1]$. Their marginal distributions satisfy (2.11), since for
any continuous function $\phi \in C(\mathbb{R})$

$$(1-p)\int_0^1 \phi(Y_1(t))\,dt + p\int_0^1 \phi(Y_2(t))\,dt
= (1-p)\int_0^1 \phi(G((1-p)t))\,dt + p\int_0^1 \phi(G(1-pt))\,dt
= \int_0^1 \phi(G(s))\,ds = \int_{\mathbb{R}} \phi(x)\,d\mu(x), \tag{2.16}$$

where the last equality is by (2.13).

By (2.16) the random variable seen on the right side of (2.7) has the same
distribution as $X$. The statement (2) readily follows from the definition (2.15).

For a proof of (4') we note that (2.9) is equivalent to

$$G(p_-) \le x_- \quad\text{and}\quad G(1-p_+) \ge x_+. \tag{2.17}$$

This implies $\delta_p^-(t) \ge x_+ - x_-$ for all $t < p_+ + p_-$, and hence (2.10) holds true. □

In the above proof, one may regard the functions $Y_1(t)$ and $Y_2(t)$ defined by (2.15)
as describing the motion of a pair of markers which move along $\mathbb{R}$ consuming the
$\mu$-measure at the steady rates of $(1-p)$ and $p$, correspondingly. The markers leap
discontinuously over intervals of zero $\mu$-measure and, conversely, linger at points of
positive mass. Their motion invokes the image of a linear version of the Pac-Man
game, and hence we shall refer to the construction by this name. Whereas in the
above construction the Pac-Men move towards each other, we shall next use the
Pac-Man algorithm with one marker chasing the other.
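For the 'colliding' variant used in the proof above, the following sketch evaluates $\delta_p^-$ of (2.8) for a hypothetical three-point measure and checks, on a grid of midpoints, that it is non-increasing and that the bound (2.10) holds. The measure, the thresholds $x_\pm$, $p_\pm$, and the helper names are assumptions chosen only for this illustration.

```python
from bisect import bisect_left
from fractions import Fraction
from itertools import accumulate

# Hypothetical example: mu = (1/4) d_0 + (1/2) d_1 + (1/4) d_3
atoms = [0, 1, 3]
weights = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
cdf = list(accumulate(weights))

def G(t):
    """Left-continuous inverse distribution function, as in (2.1)."""
    return atoms[bisect_left(cdf, t)]

def colliding(p, t):
    """Return (Y_1(t), delta_p^-(t)) for the two markers of (2.15)."""
    y1 = G((1 - p) * t)   # marker moving up from the bottom of the support
    y2 = G(1 - p * t)     # marker moving down from the top of the support
    return y1, y2 - y1

# Thresholds as in (2.9): P(X <= 0) = 1/4 and P(X > 2) = 1/4
x_minus, x_plus = 0, 2
p_minus, p_plus = Fraction(1, 4), Fraction(1, 4)
p = p_plus / (p_minus + p_plus)

n = 1000
grid = [Fraction(2 * k + 1, 2 * n) for k in range(n)]
deltas = [colliding(p, t)[1] for t in grid]

# Fraction of the grid on which delta_p^- >= x_+ - x_-, cf. (2.10)
frac_large = Fraction(sum(d >= x_plus - x_minus for d in deltas), n)
is_non_increasing = all(a >= b for a, b in zip(deltas, deltas[1:]))
```

Here the largest value of $\delta_p^-$ on the grid equals the diameter of the support, while its smallest value is zero, which illustrates the trade-off between the two constructions.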



Proof of Theorem 2.1. For the representation (2.3) we shall employ the following
variant of (2.15):

$$Y_1(t) := G((1-p)t), \qquad Y_2(t) := G(1-p+pt). \tag{2.18}$$

In this case, both $Y_1$ and $Y_2$ are monotone non-decreasing in $t$ and

$$Y_1(t) \le G(1-p) \le G(1-p+0) \le Y_2(t) \tag{2.19}$$

for all $t \in (0,1)$, where $G(1-p+0) = \lim_{\varepsilon \downarrow 0} G(1-p+\varepsilon)$.
Moreover, for any $\tau \in (0,1)$ we have the lower bound

$$\beta^+(p,\mu) \ge \min\{G(1-p) - Y_1(\tau),\; Y_2(\tau) - G(1-p+0)\}, \tag{2.20}$$

since

$$\delta_p^+(t) \ge \begin{cases} G(1-p) - Y_1(\tau) & \text{if } 0 < t \le \tau, \\ Y_2(\tau) - G(1-p+0) & \text{if } \tau < t < 1. \end{cases} \tag{2.21}$$



MICHAEL AIZENMAN, FRANÇOIS GERMINET, ABEL KLEIN, AND SIMONE WARZEL

For a sufficient condition for the uniform positivity of $\delta_p^+(t) = Y_2(t) - Y_1(t)$
let us consider the arrival/departure times:

$$T_1 = \inf\{t \in (0,1) : Y_1(t) = G(1-p)\} \quad \text{(arrival time of } Y_1\text{)},$$

$$T_2 = \sup\{t \in (0,1) : Y_2(t) = G(1-p+0)\} \quad \text{(departure time of } Y_2\text{)}.$$

The times $T_1$, $T_2$ are non-random and depend on $p$ and $\mu$ only. If

$$T_1 > T_2, \tag{2.22}$$

then for each $\tau \in (T_2, T_1)$ we have

$$\beta^+(p,\mu) \ge \min\{G(1-p) - Y_1(\tau),\; Y_2(\tau) - G(1-p+0)\} > 0. \tag{2.23}$$

The collection of $p \in (0,1)$ such that (2.22) holds is not empty whenever the
support of the measure includes more than one point. □

Remark 2.1. (i) Explicit lower bounds on $\beta^+$. For the Bernoulli decomposition
which is presented in Theorem 2.1 (i.e., based on the 'chasing Pac-Men'
algorithm), an expression for the lower bound $\beta^+(p,\mu)$ in terms of the distribution
function of $\mu$ is given in (2.34) below. A simple lower bound can be obtained in
terms of just the "half-time" points for the two markers, i.e., from (2.20) with
$\tau = \frac{1}{2}$:

$$\beta^+(p,\mu) \ge \min\left\{ G(1-p) - G\left(\tfrac{1-p}{2}\right),\; G\left(1 - \tfrac{p}{2}\right) - G(1-p) \right\}. \tag{2.24}$$

This shows that for continuous measures $\mu$ one has $\beta^+(p,\mu) > 0$, i.e., (2.6), for any
$p \in (0,1)$.

If the support of $\mu$ consists of exactly two points the representation (2.7) is
trivially available, though at a unique value of $p \in (0,1)$. If the support of $\mu$
contains more than two points, there exists at least one $\hat{x} \in \mathbb{R}$ such that

$$\mu((-\infty, x]) < \mu((-\infty, \hat{x}]) \ \text{ if } x < \hat{x}, \qquad
0 < \mu((-\infty, \hat{x})) \le \mu((-\infty, \hat{x}]) < 1. \tag{2.25}$$

At the particular value $p = 1 - \mu((-\infty, \hat{x}))$ we then have $G(1-p) = \hat{x}$ and

$$\beta^+(p,\mu) \ge \min\{\hat{x} - G((1-p)t),\; G(1-p+pt) - \hat{x}\} > 0 \tag{2.26}$$

for each $t$ such that $\frac{\mu(\{\hat{x}\})}{p} < t < 1$.

(ii) An alternative form. For another form of a Bernoulli decomposition, with
a binary random variable $\sigma = \pm 1$, let

$$\sigma = 2\eta - 1 \quad\text{and}\quad W := Y_p + \tfrac{1}{2}\delta_p^+. \tag{2.27}$$

When such a substitution is made in (2.3) the two resulting functions $W(t)$ and
$\delta_p^+(t)$ are monotone non-decreasing in $t$ and $\delta_p^+(\cdot)$ is constant over each
interval of constancy of $W(\cdot)$. It follows that the value of $\delta_p^+(t)$ can be expressed
in terms of $W(t)$, and thus one obtains a representation of the form:

$$X \stackrel{\mathcal{D}}{=} W + b(W)\,\sigma, \tag{2.28}$$

with $W$ and $\sigma$ independent random variables, and $b(\cdot)$ a measurable function which
is determined by $\mu$ and $p$.
(iii) Some precedents. As was commented above, the Bernoulli decomposition
of Theorem 2.2 with $p = 1/2$ has appeared already in a work of A. N. Kolmogorov
[Ko]. For random variables with values in $\mathbb{Z}$, the representation (2.3) of
Theorem 2.1 is related to the somewhat similar representation (though with
$\delta = 0, 1$, not necessarily positive) which D. McDonald showed to be useful for the
analysis of local limit theorems for integer random variables ([M]).

2.2. Optimality of the Pac-Man algorithm. In applications of the decompo- 
sition it is desirable to maximize the conditional variance of the binary term. We 
shall now address related questions from an optimal transport perspective, and in 
particular establish optimality, in a certain limited sense, of the 'chasing Pac-Men' 
construction. 

In addition to the explicit choices presented in Theorems 2.1 and 2.2 there are
other possibilities for a Bernoulli decomposition of the form (2.3). With a change
of variables as in (2.12), such representations can alternatively be expressed in
terms of joint distributions of the variables $Y_1$, $Y_2$ with the properties listed in the
following definition.

Definition 2.1. A $(1-p, p)$ Bernoulli decomposition of a probability measure $\mu$
on $\mathbb{R}$ is a probability measure $\rho(dY_1\,dY_2)$ on $\mathbb{R}^2$ whose marginals $\rho_1$ and
$\rho_2$ satisfy:

$$(1-p)\rho_1 + p\rho_2 = \mu. \tag{2.29}$$

This concept can of course be easily generalized to variables with values in $\mathbb{R}^d$,
or $\mathbb{C}$. For real variables the defining condition (2.29) is conveniently expressed in
terms of the distribution functions, as

$$(1-p)F_1(x) + pF_2(x) = F(x) \tag{2.30}$$

where $F(x) = \mu((-\infty, x])$, and $F_j(x) = \rho(\{Y_j \le x\})$ for $j = 1, 2$.

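For each Bernoulli decomposition $\rho$ of a probability measure on $\mathbb{R}$ we denote:

$$\beta^{*}(p,\rho) := \operatorname*{ess\,sup}_{\rho}\,(Y_2 - Y_1), \qquad
\beta_{*}(p,\rho) := \operatorname*{ess\,inf}_{\rho}\,(Y_2 - Y_1). \tag{2.31}$$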

Theorem 2.3. For any $(1-p, p)$, among all the Bernoulli decompositions of a given
probability measure $\mu$ on $\mathbb{R}$:

(1) The minimal conditional variation $\beta_{*}(p,\rho)$ is maximized by the 'chasing
Pac-Men' algorithm which is presented in the proof of Theorem 2.1, i.e. for
any Bernoulli decomposition

$$\beta_{*}(p,\rho) \le \beta^+(p,\mu) := \operatorname*{ess\,inf}_{t \in (0,1)} \delta_p^+(t), \tag{2.32}$$

where $\operatorname{ess\,inf}_{t \in (0,1)}$ yields the same value as $\inf_{t \in (0,1)}$.

(2) The maximal conditional variation $\beta^{*}(p,\rho)$ is maximized by the 'colliding
Pac-Men' algorithm of Theorem 2.2, for which $\beta^{*}(p,\rho)$ equals the diameter
of the essential support of $\mu$.

The equality $\operatorname{ess\,inf}_{t \in [0,1]} \delta_p^+(t) = \inf_{t \in [0,1]} \delta_p^+(t)$ is a simple
consequence of the left-continuity property of the chasing Pac-Men algorithm, where
$Y_j(t) = Y_j(t-0)$ and hence also $\delta_p^+(t) = \delta_p^+(t-0)$.

To prove (2.32) let us first establish a helpful expression for $\beta^+(p,\mu)$. Denoting
by $F_j^+$ the distribution functions corresponding to $Y_1$ and $Y_2$ of (2.18) we have:

Lemma 2.1. For the 'chasing Pac-Men' construction of Theorem 2.1,

$$F_1^+(x) = \frac{1}{1-p}\min\{F(x),\, 1-p\}, \qquad F_2^+(x) = \frac{1}{p}\max\{F(x) + p - 1,\, 0\} \tag{2.33}$$

and

$$\beta^+(p,\mu) = \sup\{b \in \mathbb{R} : F_1^+(x) \ge F_2^+(x+b) \text{ for all } x \in \mathbb{R}\}. \tag{2.34}$$

Proof. The statements (2.33) follow directly from the definition of the Pac-Man
process (2.18). In the derivation of (2.34), we shall use the fact that for all
$t \in (0,1)$ and $\varepsilon > 0$:

$$F_j^+(Y_j^+(t) - \varepsilon) < t \le F_j^+(Y_j^+(t)). \tag{2.35}$$

Let $S := \sup\{b \in \mathbb{R} : F_1^+(x) \ge F_2^+(x+b) \text{ for all } x \in \mathbb{R}\}$. Clearly, for
any $u > S$, there is $x \in \mathbb{R}$ such that

$$F_1^+(x) < F_2^+(x+u). \tag{2.36}$$

It follows that for any $t \in (F_1^+(x), F_2^+(x+u))$:

$$Y_2^+(t) \le x + u \quad\text{and}\quad Y_1^+(t) > x, \tag{2.37}$$

and therefore $\delta_p^+(t) = Y_2^+(t) - Y_1^+(t) < u$. Thus: $\inf_{t \in (0,1)} \delta_p^+(t) \le S$.

For the converse direction, let us note that due to the monotonicity of $F$ the
condition on $b$ in (2.34) is satisfied by all $u < S$. Thus, if $u < S$, then, for all
$x \in \mathbb{R}$:

$$F_2^+(x+u) \le F_1^+(x-0), \tag{2.38}$$

and hence for any $t \in (0,1)$:

$$F_2^+(Y_1^+(t) + u) \le F_1^+(Y_1^+(t) - 0) < t, \tag{2.39}$$

which implies that $Y_1^+(t) + u < Y_2^+(t)$. Therefore

$$\inf_{t \in (0,1)} \left(Y_2^+(t) - Y_1^+(t)\right) \ge u. \tag{2.40}$$

It follows that $\inf_{t \in [0,1]} \delta_p^+(t) \ge S$, which completes the proof of (2.34). □



Proof of Theorem 2.3. The second assertion is an elementary consequence of the
construction in Theorem 2.2. To prove (1) we shall show that for any
$b > \beta^+(p,\mu)$ it is also true that $b > \beta_{*}(p,\rho)$.

The condition (2.30) readily implies that $(1-p)F_1(x) \le F(x)$, or $F_1(x) \le
\min\{(1-p)^{-1}F(x),\, 1\}$, and hence

$$F_1(x) \le F_1^+(x), \qquad F_2(x) \ge F_2^+(x). \tag{2.41}$$

Now, by Lemma 2.1, for any $b > \beta^+(p,\mu)$ there exist some $t, u \in \mathbb{R}$, such that
$F_1^+(u) = t < F_2^+(u+b)$ and therefore, due to (2.41), also

$$F_1(u) \le t < F_2(u+b). \tag{2.42}$$

Eq. (2.42) means that $\rho(\{Y_1 \le u\}) \le t$ and $\rho(\{Y_2 > u+b\}) < 1-t$. Since the
probabilities of the two events add to less than 1 the complement of their union is
of positive probability, and this implies:

$$\rho(\{Y_2 - Y_1 < b\}) > 0, \tag{2.43}$$

and hence $b > \beta_{*}(p,\rho)$. This concludes the proof of (2.32). □
Remark 2.2. The idea of seeking optimal joint realizations of random variables
with constrained marginals has made it possible to present a wide range of analytical results
from a common 'optimal transport' perspective (see, e.g., 0). The most familiar 
variants of the problem concern couplings which minimize a distance function be- 
tween the two coupled variables. As our discussion demonstrates, it may also be of 
interest to seek couplings which maximize the difference between the two variables 
with constrained marginals. 

3. Concentration Bounds 

We shall now demonstrate how the Bernoulli decomposition yields probabilistic
bounds from combinatorial results. If there is any novelty in this section it is
in the formulation of the bounds for the non-linear case; the two main ideas
were noted before in the context of linear functions: P. Erdős [Er] observed that
concentration bounds for linear functions of Bernoulli variables can be derived from
the combinatorial theory of E. Sperner [S], and B. A. Rogozin [R1] has used the
Bernoulli decomposition of A. N. Kolmogorov [Ko] for the further extension of these
bounds to arbitrary random variables.

First, we present some essentially known results of Sperner theory; in the second 
subsection these results will be combined with the Bernoulli decomposition to yield 
concentration bounds for functions of independent random variables. 

3.1. Probabilistic Sperner Estimates. The configuration space $\{0,1\}^N$ for a
collection of Bernoulli random variables $\eta = \{\eta_1, \dots, \eta_N\}$ is partially ordered
by the relation defined by:

$$\eta \prec \eta' \iff \text{for all } i \in \{1,\dots,N\}:\ \eta_i \le \eta'_i. \tag{3.1}$$

A set $A \subset \{0,1\}^N$ is said to be an antichain if it does not contain any pair of
distinct configurations which are comparable in the sense of (3.1). The original
Sperner Lemma states that for any such set: $|A| \le \binom{N}{[N/2]}$. A more general result
is the LYM inequality for antichains (cf. [An]):

$$\sum_{\eta \in A} \frac{1}{\binom{N}{|\eta|}} \le 1, \tag{3.2}$$

where $|\eta| = \sum_j \eta_j$.
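The LYM inequality (3.2) is easy to test by brute force for small $N$. The following sketch checks the antichain property in the sense of (3.1) and evaluates the LYM sum for two antichains in $\{0,1\}^5$; the particular families and the function names `is_antichain` and `lym_sum` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product
from math import comb

def is_antichain(family):
    """True if no two distinct members are comparable coordinatewise."""
    fam = list(family)
    for a in fam:
        for b in fam:
            if a != b and all(x <= y for x, y in zip(a, b)):
                return False
    return True

def lym_sum(family, n):
    """Left-hand side of the LYM inequality (3.2)."""
    return sum(Fraction(1, comb(n, sum(eta))) for eta in family)

N = 5
# The middle layer |eta| = 2, which saturates both Sperner's bound and LYM
middle_layer = [eta for eta in product((0, 1), repeat=N) if sum(eta) == N // 2]
# A hand-picked antichain mixing configurations from several layers
mixed = [(1, 0, 0, 0, 0), (0, 1, 1, 0, 0), (0, 1, 0, 1, 1), (0, 0, 1, 1, 1)]

middle_total = lym_sum(middle_layer, N)
mixed_total = lym_sum(mixed, N)
```

The middle layer attains equality in (3.2), while the mixed family gives a strictly smaller sum.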

The LYM inequality has the following probabilistic implication. 

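Lemma 3.1. Let $\{\eta_j\}$ be independent copies of a Bernoulli random variable $\eta$ with

$$\mathbb{P}(\{\eta = 1\}) = p, \qquad \mathbb{P}(\{\eta = 0\}) = q := 1 - p, \tag{3.3}$$

where $p \in (0,1)$. Then for any antichain $A \subset \{0,1\}^N$:

$$\mathbb{P}(\{\eta \in A\}) \le \frac{\Theta}{\sigma_\eta \sqrt{N}}, \tag{3.4}$$

where $\eta = (\eta_1, \dots, \eta_N)$, $\sigma_\eta = \sqrt{pq}$ is the standard deviation of $\eta$, and
$\Theta$ is an independent constant which does not exceed $2\sqrt{2}$.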

Proof. Let $A_k$ be the subset of $A$ consisting of configurations with $|\eta| = k$. Then:

$$\mathbb{P}(\{\eta \in A\}) = \sum_{k} p^k q^{N-k} |A_k| = \sum_{k} b(k;N,p)\,\frac{|A_k|}{\binom{N}{k}} \le \max_{k=0,1,\dots,N} b(k;N,p), \tag{3.5}$$

where $b(k;N,p) := p^k q^{N-k} \binom{N}{k}$ is the binomial distribution, and the inequality
is by (3.2). The maximum of $b(k;N,p)$ over $k$, which is known to occur near $k = pN$
(cf. [H, Theorem 1 on p. 140]), yields (3.4). □

The bound (3.4) has the virtue of being valid for all $N$; for $N \to \infty$ it holds with
a smaller constant which tends to the asymptotic value $\Theta \to 1/\sqrt{2\pi}$ (implied by
(3.5) and Stirling's formula).
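The last step of the proof, and the remark on the asymptotic constant, can be probed numerically. The sketch below compares the maximum of the binomial weights $b(k;N,p)$ with the right-hand side of (3.4) at $\Theta = 2\sqrt{2}$ for a few arbitrarily chosen values of $N$ and $p$, and evaluates the rescaled maximum at $N = 200$, $p = 1/2$. The function names and the sampled parameter values are assumptions of this example, and the check is a numerical illustration rather than a proof.

```python
from math import comb, pi, sqrt

def max_binomial_pmf(n, p):
    """max over k of b(k; n, p) = C(n, k) p^k q^(n-k)."""
    q = 1.0 - p
    return max(comb(n, k) * p**k * q**(n - k) for k in range(n + 1))

def sperner_bound(n, p, theta=2 * sqrt(2)):
    """Right-hand side of (3.4): theta / (sigma * sqrt(n)), sigma = sqrt(pq)."""
    return theta / (sqrt(p * (1.0 - p)) * sqrt(n))

# (n, p, max pmf, bound) for a few sample parameters
checks = [(n, p, max_binomial_pmf(n, p), sperner_bound(n, p))
          for n in (1, 2, 5, 10, 50, 200)
          for p in (0.1, 0.3, 0.5, 0.9)]
bound_holds = all(m <= b for _, _, m, b in checks)

# Rescaled maximum sigma * sqrt(N) * max_k b(k; N, p), to compare with 1/sqrt(2 pi)
rescaled = max_binomial_pmf(200, 0.5) * 0.5 * sqrt(200)
asymptotic = 1 / sqrt(2 * pi)
```

For small $N$ the bound exceeds 1 and is trivially satisfied; its content lies in the $1/\sqrt{N}$ decay for large $N$.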

Following is an extension of Lemma 3.1 to the case of non-identically distributed
random variables.

Lemma 3.2. Let $\eta = (\eta_1, \dots, \eta_N)$, where $\{\eta_j\}$ are independent Bernoulli random
variables with possibly different values of $p_j$, and set

$$\alpha := \min_{j=1,2,\dots,N} \min\{p_j,\, 1 - p_j\} \in (0, 1/2]. \tag{3.6}$$

Then, for any antichain $A \subset \{0,1\}^N$:

$$\mathbb{P}\{\eta \in A\} \le \frac{\widetilde{\Theta}}{\alpha \sqrt{N}}, \tag{3.7}$$

where $\widetilde{\Theta}$ is an independent constant which does not exceed 4.

The proof gives us the chance to introduce the technique of 'double sampling'. 

Proof. We start from the observation that any Bernoulli variable $\eta$ with parameter
$p_\eta$ as in (3.3) may be decomposed in terms of two independent Bernoulli variables
$\chi$ and $\xi$ as

$$\eta = \xi \chi, \tag{3.8}$$

with $p_\xi\, p_\chi = p_\eta$.

By the definition of $\alpha$, eq. (3.6), $p_j \in [\alpha, 1-\alpha]$ for all $j = 1, 2, \dots, N$. Hence
the variables $\eta_j$ may be represented as in (3.8) with independent identically
distributed (iid) Bernoulli variables $\{\chi_j\}$ with common $p_\chi := 1 - \alpha$. We abbreviate
this representation as $\eta = \xi\chi := (\xi_1\chi_1, \dots, \xi_N\chi_N)$. Evaluating the probability by
first conditioning on the values of $\xi$, one has

$$\mathbb{P}\{\eta \in A\} = \mathbb{E}\left[\mathbb{P}\{\xi\chi \in A \mid \xi\}\right]. \tag{3.9}$$

For specified values of the variables $\xi$, the event $\{\xi\chi \in A\}$ depends only on the
values of $\chi_j$ with $j$ in the set $J_\xi := \{j : \xi_j \neq 0\}$, and as such it is an antichain
in $\{0,1\}^{J_\xi}$. Bounding its conditional probability by Lemma 3.1 we obtain

$$\mathbb{P}\{\xi\chi \in A \mid \xi\} \le \min\left\{1,\ \frac{\Theta}{\sigma_\chi \sqrt{|J_\xi|}}\right\}, \tag{3.10}$$

where $\sigma_\chi = \sqrt{\alpha(1-\alpha)}$ is the common standard deviation of $\chi_j$.

To conclude the proof of (3.7) it remains to estimate the expected value of
the right hand side of (3.10), where $|J_\xi| = \sum_{j=1}^{N} \xi_j$. Noting that
$\mathbb{E}(\xi_j) = p_{\xi_j} = p_j/(1-\alpha) \ge \alpha/(1-\alpha)$, we see that the mean satisfies:

$$\mathbb{E}(|J_\xi|) \ge \frac{\alpha N}{1-\alpha}. \tag{3.11}$$

The event $\{|J_\xi| < \alpha N / (2(1-\alpha))\}$ is of exponentially small probability, as can be
seen by a standard large deviation estimate for independent variables. It then
readily follows that

$$\mathbb{E}\left[\min\left\{1,\ \frac{\Theta}{\sigma_\chi \sqrt{|J_\xi|}}\right\}\right] \le \frac{\widetilde{\Theta}}{\alpha \sqrt{N}} \tag{3.12}$$

with a constant $\widetilde{\Theta}$ for which elementary estimates yield $\widetilde{\Theta} \le 4$. □

Remark 3.1. The above notions and results have natural extensions to integer
valued independent random variables, $\tau = (\tau_1, \tau_2, \dots, \tau_N)$, whose configuration
space, $\mathbb{Z}^N$, is also partially ordered by the natural extension of the relation (3.1).
The Bernoulli decomposition (2.7) can be used for an extension of the probabilistic
bound of Lemma 3.2 to this more general case. One way to derive the general
statement is through the application of the bound (3.7) to the conditional probability
for the Bernoulli component, as in the arguments which appear below.
Alternatively, one may note that the statement directly follows from Theorem 3.1
which is presented in the next section.

For completeness it should be added that in addition to the anti-concentration
upper bounds it is of interest to know the asymptotic behavior. That is covered by
known results, such as is presented in Engel [E, Theorem 7.2.1]:

$$\lim_{N \to \infty} \sigma_\tau \sqrt{2\pi N}\, \Big[ \max_{A \subset \{0,1,\dots,k\}^N \text{ antichain}} \mathbb{P}\{\tau \in A\} \Big] = 1, \tag{3.13}$$

which amounts to a 'local' central limit theorem (CLT).

3.2. Concentration Bounds for Functions of Independent Random Variables.
We shall now employ the Bernoulli decomposition of Section 2 along with
the results presented in the previous subsection, for an upper bound on the
concentration probability

$$Q_Z(\varepsilon) := \sup_{x \in \mathbb{R}} \mathbb{P}(\{Z \in [x, x+\varepsilon]\}) \tag{3.14}$$

for random variables of the form

$$Z = \Phi(X_1, X_2, \dots, X_N), \tag{3.15}$$

where $\{X_j\}$ are independent random variables.

Theorem 3.1. Let $X = (X_1, \dots, X_N)$ be a collection of independent random
variables whose distributions satisfy, for all $j \in \{1, \dots, N\}$:

$$\mathbb{P}(\{X_j \le x_-\}) \ge p_- \quad\text{and}\quad \mathbb{P}(\{X_j > x_+\}) \ge p_+ \tag{3.16}$$

at some $p_\pm > 0$ and $x_- < x_+$, and $\Phi : \mathbb{R}^N \to \mathbb{R}$ a function such that for some
$\varepsilon > 0$

$$\Phi(u + v e_j) - \Phi(u) > \varepsilon \tag{3.17}$$

for all $v \ge x_+ - x_-$, all $u \in \mathbb{R}^N$, and $j = 1, \dots, N$, with $e_j$ the unit vector in
the $j$-direction. Then, the random variable $Z$ which is defined by (3.15) obeys the
concentration bound

$$Q_Z(\varepsilon) \le \frac{4}{\sqrt{N}} \sqrt{\frac{1}{p_+} + \frac{1}{p_-}}, \tag{3.18}$$

where 4 can also be replaced by the constant $\widetilde{\Theta}$ of (3.7).
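To see the statement of Theorem 3.1 at work, the following sketch computes the concentration function exactly for a hypothetical non-linear example and compares it with the right-hand side of (3.18). The assumptions, all chosen only for illustration, are: $X_j$ iid uniform on $\{0,1,2\}$, so that (3.16) holds with $x_- = 0$, $x_+ = 1$, $p_\pm = 1/3$; the function $\Phi(u) = s + s^3$ with $s = \sum_j u_j$, whose increments exceed $v \ge 1$, so that (3.17) holds with $\varepsilon = 1$; and $N = 400$.

```python
from math import sqrt

def law_of_sum(single, n):
    """Exact law of X_1 + ... + X_n for iid integer X_j with law `single`."""
    law = {0: 1.0}
    for _ in range(n):
        new = {}
        for s, ps in law.items():
            for x, px in single.items():
                new[s + x] = new.get(s + x, 0.0) + ps * px
        law = new
    return law

def concentration(law, eps):
    """Q_Z(eps) = sup_x P(Z in [x, x + eps]) for a finitely supported law."""
    pts = sorted(law)
    best, j, window = 0.0, 0, 0.0
    for left in pts:
        # extend the window to every support point in [left, left + eps]
        while j < len(pts) and pts[j] <= left + eps:
            window += law[pts[j]]
            j += 1
        best = max(best, window)
        window -= law[left]   # slide the left endpoint to the next point
    return best

N = 400
single = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}   # hypothetical single-variable law
law_S = law_of_sum(single, N)

def phi(s):
    # monotone, non-linear function of the sum (an assumption of this example)
    return s + s**3

law_Z = {phi(s): prob for s, prob in law_S.items()}

p_minus = p_plus = 1 / 3
q_z = concentration(law_Z, eps=1)
bound = 4 / sqrt(N) * sqrt(1 / p_plus + 1 / p_minus)
```

In this example the bound is far from sharp, in line with the comment in the Introduction that a general method of this kind is not optimized for specific applications.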
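Proof. We start by selecting $p \in (0,1)$ by the condition $p = \frac{p_+}{p_- + p_+}$. Next, we
represent the variables $\{X_j\}$ using Theorem 2.2:

$$X \stackrel{\mathcal{D}}{=} Y(t) + \delta(t)\eta := \left(Y_{p,1}(t_1) + \delta_{p,1}^-(t_1)\,\eta_1,\ \dots,\ Y_{p,N}(t_N) + \delta_{p,N}^-(t_N)\,\eta_N\right), \tag{3.19}$$

with $\eta = (\eta_1, \dots, \eta_N)$ a collection of iid Bernoulli variables taking values $\{0,1\}$
with probability $\{1-p, p\}$. From (2.10) one may conclude that for all
$j \in \{1, \dots, N\}$:

$$\mathbb{P}_t(\{\delta_{p,j}^-(t) \ge x_+ - x_-\}) \ge p_+ + p_-. \tag{3.20}$$

We express the probability of the event $\{Z \in [x, x+\varepsilon]\}$ through first conditioning
on the $\{t_j\}$ variables. For all $x \in \mathbb{R}$:

$$\mathbb{P}(\{Z \in [x, x+\varepsilon]\}) = \mathbb{E}\left[\mathbb{P}(A_t \mid t)\right] \tag{3.21}$$

where

$$A_t := \{\eta \in \{0,1\}^N : \Phi(Y(t) + \delta(t)\eta) \in [x, x+\varepsilon]\}. \tag{3.22}$$

By virtue of (3.17), the set $A_t$ is an antichain in its dependence on $\{\eta_j\}_{j \in J_t}$ with
$J_t := \{j : \delta_j(t_j) \ge x_+ - x_-\}$. Lemma 3.1 thus yields

$$\mathbb{P}\left(A_t \mid t, \{\eta_j\}_{j \notin J_t}\right) \le \min\left\{1,\ \frac{\Theta}{\sigma_\eta \sqrt{|J_t|}}\right\} \tag{3.23}$$

with $\sigma_\eta = \sqrt{p(1-p)}$. We conclude by the large-deviation argument used in the
proof of Lemma 3.2. Using (3.20) the expected value of
$|J_t| = \sum_{j=1}^{N} 1_{\{\delta_j(t_j) \ge x_+ - x_-\}}$ is bounded below:

$$\mathbb{E}(|J_t|) \ge (p_+ + p_-)N. \tag{3.24}$$

Therefore $\{|J_t| < \frac{1}{2}(p_+ + p_-)N\}$ is a large deviation event and its probability is
exponentially bounded. Elementary estimates lead to

$$\mathbb{E}\left[\min\left\{1,\ \frac{\Theta}{\sigma_\eta \sqrt{|J_t|}}\right\}\right] \le \frac{\widetilde{\Theta}}{\sqrt{N}} \sqrt{\frac{1}{p_+} + \frac{1}{p_-}}, \tag{3.25}$$

with the same constant $\widetilde{\Theta}$ as in (3.7). □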

Remark 3.2. (i) A simpler proof for iid variables. For iid non-degenerate
random variables $X_1, \dots, X_N$ the theorem has a simpler proof using the binary
decomposition of Theorem 2.1; there is no need for the large deviation argument. The
constants in the theorem will then depend on the value of $p$ and its corresponding
lower bound in (2.6).

(ii) The linear case. For linear functions,

$$Z = \Phi(X_1, \dots, X_N) = \sum_{j=1}^{N} X_j, \tag{3.26}$$

concentration inequalities as in (3.18) go back to W. Doeblin, P. Lévy [DoL, Do],
P. Erdős [Er] (for the Bernoulli case, where it reduces to the Littlewood-Offord
problem), A. N. Kolmogorov [Ko], B. A. Rogozin [R1], H. Kesten [K] and C. G.
Esseen [Es]. In this case, sharper inequalities than (3.18) are known, e.g. [R3].

N 
where Q is some constant. A recent application of the discrete case of the
concentration bounds is found in [TV].

(iii) An extension. As is already true for (3.27), the statement of Theorem 3.1
has an immediate extension to functions which in some variables are monotone
increasing and in some are monotone decreasing, satisfying the natural analog of
(3.17). For this extension, one only needs to replace $p_+$ and $p_-$ in (3.18) by
$p = \min\{p_+, p_-\}$.

(iv) Sperner bounds from concentration inequalities. In the proof of Theorem 3.1
concentration bounds were deduced from the probabilistic Sperner estimate (3.4).
For antichains in the multiset $S = \{0, 1, \dots, K\}^N$ the implication can also be
established in the opposite direction. For that, one may use the fact that in such
a multiset for any antichain $A$ there is a function $\Phi : S \to \mathbb{R}$ which satisfies the
'representation condition' (in the terminology of [E])

$$\Phi(u + e_j) - \Phi(u) \ge 1 \tag{3.28}$$

and for which $\Phi(u) = 0$ if and only if $u \in A$.

4. An Application to Random Schrödinger Operators

As a demonstration of a possible use of the elementary observations which are
made in this article, let us present the case of spectral localization under random
iid single site potential for an arbitrary probability distribution.

The (continuum) Anderson Hamiltonian is the random Schrödinger operator
given by

$$H_\omega = -\Delta + V_\omega \quad\text{on } L^2(\mathbb{R}^d), \tag{4.1}$$

with

$$V_\omega(x) = \sum_{\zeta \in \mathbb{Z}^d} \omega_\zeta\, u(x - \zeta), \tag{4.2}$$

where

(1) $u(\cdot)$, the single site potential, is a nonnegative bounded measurable
function on $\mathbb{R}^d$ with compact support, uniformly bounded away from zero in a
neighborhood of the origin,

(2) $\omega = \{\omega_\zeta\}_{\zeta \in \mathbb{Z}^d}$ is a family of independent identically distributed random
variables, whose common probability distribution $\mu$ satisfies:
$\{0, M\} \subset \operatorname{supp} \mu \subset [0, M]$, for some $M > 0$.

The random operator $H_\omega$ is a function of $\omega$, and as such it is defined over a
probability space which is invariant under the ergodic action of the group of $\mathbb{Z}^d$
translations. The induced maps on this operator valued function are implemented
by unitary translations.

Ergodicity considerations carry the implication that there exist fixed subsets of $\mathbb{R}$
so that the spectrum of the self-adjoint operator $H_\omega$, as well as its pure point (pp),
absolutely continuous (ac), and singular continuous (sc) components, are equal to
these fixed sets with probability one (cf. [P, KuS, KiM]). In the case of the random
potential (4.2), the positivity of $u(\cdot)$ and the support properties of $\mu$ imply that

$$\sigma(H_\omega) = [0, \infty). \tag{4.3}$$
Although definitions of localization may come in several flavors, they all include
(or imply) spectral localization (i.e., pure point spectrum), as given in the following
definition.

Definition 4.1. A self-adjoint operator $H$ on $L^2(\mathbb{R}^d)$ is said to exhibit spectral
localization in a closed interval $I \subset \mathbb{R}$ if $\sigma(H) \cap I \neq \emptyset$ and the corresponding
spectral projection $P_I(H)$ is given by a countable sum of orthogonal projections on
proper eigenspaces.

This property is clearly invariant under translations. The defining condition is
equivalent to the requirement that for a spanning set of vectors the spectral measure
is pure-point within $I$. The set of $\omega$ for which this holds for the random operator
$H_\omega$ is known to be measurable.

In the one-dimensional case the continuous Anderson Hamiltonian has been long
known to exhibit spectral localization in the whole real line for any non-degenerate
$\mu$, i.e. when the random potential is not constant [GoMP, DSS]. In the
multi-dimensional case, localization at the bottom of the spectrum is already known at
great, but nevertheless not all-inclusive, generality; cf. [St, Kl, BK] and references
therein. The Bernoulli decomposition presented here allows one to prove localization
for general non-degenerate single site distributions $\mu$.

More explicitly, the simplest case to deal with, for the different approaches which
yield proofs of localization, has been when the single site probability distribution
is absolutely continuous with bounded derivative. The absolute continuity condition
can be relaxed to Hölder continuity of $\mu$, both in the approach based on the
multiscale analysis which was introduced in [FrS] and is discussed in [Kl], and in
the one based on the fractional moment method of [AM], [AE+]. (The basis in the
former case is an improved analysis of the Wegner estimate, which can be found
in [St, CHK].) However, techniques relying on the regularity of $\mu$ seem to reach
their limit with log-Hölder continuity. In particular, until recently the Bernoulli
random potential had been beyond the reach of analysis in more than one dimension.
For that extreme case, i.e., of $H_\omega$ with $\mu\{1\} = \mu\{0\} = \tfrac{1}{2}$, localization at
the bottom of the spectrum was recently proven by Bourgain and Kenig [BK]. A
crucial step in the analysis of [BK] is the estimation of the probabilities of energy
resonances using Sperner's Lemma, i.e., the $p = \tfrac{1}{2}$ version of (3.4).
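The combinatorial mechanism behind such a resonance estimate can be checked by brute force on a small cube. The following is only an illustrative sketch, not the argument of [BK]: it assumes a hypothetical function $\Phi(\eta) = \sum_i a_i \eta_i$ with positive integer weights $a_i$ (the names `weights`, `resonant_configs`, `is_antichain` and `max_resonance_count` are made up for the example). Raising any $\eta_i$ from 0 to 1 increases $\Phi$ by at least $\min_i a_i$, so the configurations with $\Phi(\eta)$ in a half-open window of that width form an antichain in $\{0,1\}^n$, and Sperner's Lemma bounds their number by $\binom{n}{\lfloor n/2 \rfloor}$.

```python
from itertools import product
from math import comb

def resonant_configs(weights, lo, width):
    """Configurations eta in {0,1}^n with lo <= Phi(eta) < lo + width,
    where Phi(eta) = sum_i weights[i] * eta[i]."""
    return [eta for eta in product((0, 1), repeat=len(weights))
            if lo <= sum(w * e for w, e in zip(weights, eta)) < lo + width]

def is_antichain(configs):
    """True if no configuration is coordinatewise <= a different one."""
    return not any(a != b and all(x <= y for x, y in zip(a, b))
                   for a in configs for b in configs)

def max_resonance_count(weights):
    """Largest number of configurations in any window of width min(weights).
    With integer weights it suffices to scan integer left endpoints."""
    width = min(weights)
    best = 0
    for lo in range(sum(weights) + 1):
        configs = resonant_configs(weights, lo, width)
        assert is_antichain(configs)  # monotonicity of Phi forces this
        best = max(best, len(configs))
    return best

# Hypothetical positive integer weights (an assumption of this example).
weights = [10, 13, 21, 10, 17, 12, 10, 25]
n = len(weights)
sperner_bound = comb(n, n // 2)       # size of the largest antichain in {0,1}^n
count = max_resonance_count(weights)
print(count, "<=", sperner_bound)
```

Dividing by $2^n$ turns the count into a probability for iid symmetric Bernoulli variables, which is the sense in which the bound is a $p = \tfrac{1}{2}$ statement.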

The point which we would like to make here is that the Bernoulli decomposition
of random variables enables one to turn the latter result of Bourgain and Kenig
[BK] into a tool for a general proof of localization at the edge of the spectrum for
arbitrary non-degenerate $\mu$.

First, the Bourgain and Kenig [BK] analysis needs to be extended to Schrödinger
operators which incorporate an additional background potential $U \in L^\infty(\mathbb{R}^d)$, and
for which the variances of the Bernoulli terms are uniformly positive, though not
necessarily uniform. More explicitly, the class is broadened to include operators of
the form

$$H_\eta = -\Delta + U(x) + \sum_{\zeta \in \mathbb{Z}^d} \eta_\zeta\, b_\zeta\, u(x - \zeta), \tag{4.4}$$

where $u(\cdot)$ is as in (4.2), satisfying the above condition (1), but instead of (2):
(2') $\eta = \{\eta_\zeta\}_{\zeta \in \mathbb{Z}^d}$ are iid Bernoulli random variables taking the values $\{0, 1\}$
with probabilities $\{1-p, p\}$, and the coefficients $\{b_\zeta\}_{\zeta \in \mathbb{Z}^d}$ satisfy

$$0 < b_- \le b_\zeta \le b_+ < \infty \quad \text{for all } \zeta \in \mathbb{Z}^d, \tag{4.5}$$

and

(3) $U \in L^\infty(\mathbb{R}^d)$ satisfies, for all $x \in \mathbb{R}^d$: $0 \le U(x) \le U_+ < \infty$.

Due to the presence of the background potential $U$ the spectrum of $H_\eta$ need not
be deterministic, i.e., equal to some fixed set with probability one. For our main
purpose it would suffice to restrict attention to $U$ for which the spectrum of $H_\eta$
is almost surely $[0,\infty)$. Such a restriction is not included in the following statement
but instead there is a caveat in the conclusion.

The extended BK result, whose proof is presented in [GK], is:

Theorem 4.1. Given a function $u(\cdot)$ as above, and: $p \in (0,1)$, $b_\pm > 0$ and
$U_+ < \infty$, there exists $E_0 > 0$ such that any random operator $H_\eta$ of the form (4.4),
satisfying conditions (1), (2') and (3), for otherwise arbitrary external potential
$U$, with probability one, either exhibits spectral localization in $[0, E_0]$ or
$\sigma(H_\eta) \cap [0, E_0] = \emptyset$.



Theorem 2.1 now allows one to deduce the following general statement from the
above non-trivial Bernoulli result.

Theorem 4.2. Let $H_\omega = -\Delta + V_\omega$ be a Schrödinger operator with the random
potential given by (4.2), satisfying the above conditions (1) and (2). Then for some
$E_0 > 0$ the operator $H_\omega$, with probability one, exhibits spectral localization in $[0, E_0]$.



Proof. The Bernoulli decomposition (2.3) allows one to write the coefficients in the
random potential in the form:

$$\omega \stackrel{\mathcal{D}}{=} \{ Y_+(t_\zeta) + \delta_+(t_\zeta)\, \eta_\zeta \}_{\zeta \in \mathbb{Z}^d}, \tag{4.6}$$

with $t = \{t_\zeta\}_{\zeta \in \mathbb{Z}^d}$ a family of independent random variables which are uniformly
distributed in $(0,1)$, $Y_+$ and $\delta_+$ the functions defined in (2.12) in terms of the
distribution function of $\mu$, and $\eta = \{\eta_\zeta\}_{\zeta \in \mathbb{Z}^d}$ a family of iid Bernoulli variables,
independent of $t$, which take values in $\{0, 1\}$ with probabilities $\{1-p, p\}$ for some
$p \in (0, 1)$ such that $\inf_{t \in (0,1)} \delta_+(t) > 0$ holds.

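As a consequence, the random operator can be written as:

$$H_\omega \stackrel{\mathcal{D}}{=} -\Delta + U_t + V_{t,\eta} =: H_{t,\eta}, \tag{4.7}$$

where

$$U_t(x) := \sum_{\zeta \in \mathbb{Z}^d} Y_+(t_\zeta)\, u(x - \zeta) \quad \text{and} \quad V_{t,\eta}(x) := \sum_{\zeta \in \mathbb{Z}^d} \delta_+(t_\zeta)\, \eta_\zeta\, u(x - \zeta), \tag{4.8}$$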



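and the following bounds hold

$$0 \le U_t(x) \le U_+ := M\, \Big\| \sum_{\zeta \in \mathbb{Z}^d} u(\cdot - \zeta) \Big\|_\infty < \infty, \qquad 0 < b_- := \inf_{t \in (0,1)} \delta_+(t) \le b_+ := M < \infty. \tag{4.9}$$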
This implies that when conditioned on the values of $t$ the operator $H_{t,\eta}$ is of the
form (4.4), with $p$, $U_+$ and $b_\pm$ independent of $t$. Thus, by Theorem 4.1 there
exists $E_0 > 0$ such that when conditioned on $t$, with probability one, $H_{t,\eta}$ either
exhibits spectral localization or has no spectrum in $[0, E_0]$. However, the latter is
excluded (almost surely, also with respect to the conditional probability) by (4.3)
and Fubini. □
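The splitting used in (4.6)-(4.8) can be illustrated numerically. The sketch below is a minimal example under stated assumptions: it uses a simple quantile-splitting construction, which is not claimed to coincide with the functions $Y_+$ and $\delta_+$ of (2.12), takes $\mu$ uniform on $[0, M]$, works in dimension $d = 1$ on a finite set of sites, and uses the indicator of $[0,1)$ as a hypothetical single-site potential. All function and variable names are made up for the illustration.

```python
import random

def bernoulli_decomposition(quantile, p):
    """Return functions (Y, delta) on (0,1) such that Y(t) + delta(t)*eta has the
    law with the given quantile function, when t is uniform on (0,1) and
    eta is Bernoulli(p), independent of t."""
    def Y(t):
        return quantile((1 - p) * t)             # point in the lower (1-p)-part
    def delta(t):
        return quantile(1 - p + p * t) - Y(t)    # jump to the upper p-part
    return Y, delta

def u(x):
    """Hypothetical single-site potential: indicator of [0, 1)."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0

M, p = 2.0, 0.5
Y, delta = bernoulli_decomposition(lambda s: M * s, p)   # mu uniform on [0, M]

rng = random.Random(0)
sites = range(-3, 4)                                     # finite piece of Z
t = {z: rng.random() for z in sites}
eta = {z: int(rng.random() < p) for z in sites}
omega = {z: Y(t[z]) + delta(t[z]) * eta[z] for z in sites}   # analogue of (4.6)

def V_omega(x):
    """Random potential built from the coupling constants omega."""
    return sum(omega[z] * u(x - z) for z in sites)

def U_t(x):
    """Background potential, depending on t only (analogue of U_t in (4.8))."""
    return sum(Y(t[z]) * u(x - z) for z in sites)

def V_t_eta(x):
    """Bernoulli part with t-dependent coefficients (analogue of V_{t,eta})."""
    return sum(delta(t[z]) * eta[z] * u(x - z) for z in sites)

def sample_omega(r):
    """One draw of Y(t) + delta(t)*eta with fresh t and eta."""
    s = r.random()
    e = int(r.random() < p)
    return Y(s) + delta(s) * e

grid = [k / 10 for k in range(-40, 50)]
max_gap = max(abs(V_omega(x) - (U_t(x) + V_t_eta(x))) for x in grid)
print("max |V_omega - (U_t + V_t_eta)| on the grid:", max_gap)
```

In this example $\delta(t) = M/2$ for every $t$, so the Bernoulli coefficients are trivially bounded below. For a general $\mu$ the lower bound has to come from the choice of $p$ and of the decomposition, which is what the conditions entering (4.9) provide.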






Remark 4.1. In addition to the spectral localization it is also of interest to establish
the existence of a uniform localization length, i.e., to prove that all eigenfunctions
$\phi$ of $H_\omega$ with eigenvalue in $[0, E_0]$ satisfy

$$\int_{|x-y| \le 1} |\phi(y)|^2 \, dy \le C_\phi\, e^{-2|x|/\ell} \quad \text{for all } x \in \mathbb{R}^d. \tag{4.10}$$

This can be accomplished in the following two ways, for which the details are
presented in [GK].

To establish uniform localization length under the hypotheses of Theorem 4.2
one may use the Bernoulli decomposition (4.6) before performing the multiscale
analysis which is behind the proof of Theorem 4.1. The multiscale analysis is then
executed for the random Schrödinger operator $H_{t,\eta}$ in (4.7), in such a way that all
events in the analysis are jointly measurable in $t$ and $\eta$.

An alternative proof of Theorem 4.2, which yields also uniform localization
length, can be based on the concentration bound of Theorem 3.1. Namely, the
Bourgain-Kenig proof can be extended to arbitrary single site probability distribution
$\mu$, with the probabilities of energy resonance estimated by the concentration
bound instead of by Sperner's Lemma as in [BK] (see [GK]).